A Feature Weight Adjustment Algorithm for Document Categorization
Authors
Abstract
In recent years we have seen tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intranets. Automatic text categorization, the task of assigning text documents to pre-specified classes (topics or themes) of documents, is an important task that can help both in organizing and in finding information on these huge resources. In this paper we present a fast iterative feature weight adjustment algorithm for the linear-complexity centroid-based classification algorithm. Our algorithm uses a measure of the discriminating power of each term to gradually adjust the weights of all features concurrently. We experimentally evaluate our algorithm on the Reuters-21578 and OHSUMED document collections and compare it against a variety of other categorization algorithms. Our experiments show that feature weight adjustment improves the performance of the centroid-based classifier by 2%–5%, substantially outperforms Rocchio and Widrow-Hoff, and is competitive with SVM.
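To make the setting concrete, here is a minimal sketch of a centroid-based classifier with optional per-term weights. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the weight update rule or the discriminating-power measure, so `feature_weights` is a hypothetical, hand-supplied dictionary, and the function names and toy documents are invented for this example.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse term -> weight vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def apply_weights(vec, feature_weights):
    """Rescale each term by its feature weight (default 1.0), then renormalize."""
    return normalize({t: v * feature_weights.get(t, 1.0) for t, v in vec.items()})


def train_centroids(docs, labels, feature_weights=None):
    """Sum the unit-length vectors of each class into one normalized centroid."""
    feature_weights = feature_weights or {}
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(docs, labels):
        for t, v in apply_weights(vec, feature_weights).items():
            sums[label][t] += v
    return {label: normalize(s) for label, s in sums.items()}


def classify(vec, centroids, feature_weights=None):
    """Return the class whose centroid has the highest cosine similarity."""
    q = apply_weights(vec, feature_weights or {})

    def cos(centroid):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(v * centroid.get(t, 0.0) for t, v in q.items())

    return max(centroids, key=lambda label: cos(centroids[label]))


# Toy term-frequency vectors (hypothetical data, two classes).
docs = [
    {"wheat": 2, "price": 1},
    {"wheat": 1, "farm": 1},
    {"oil": 2, "price": 1},
    {"oil": 1, "barrel": 1},
]
labels = ["grain", "grain", "crude", "crude"]

centroids = train_centroids(docs, labels)
prediction = classify({"wheat": 1, "price": 1}, centroids)

# Assumed weights: shrink "price", which occurs in both classes.
weights = {"price": 0.1}
weighted_centroids = train_centroids(docs, labels, weights)
weighted_prediction = classify({"price": 5, "wheat": 1}, weighted_centroids, weights)
```

Classification costs one dot product per class, which is where the linear complexity mentioned in the abstract comes from. An iterative scheme of the kind the abstract describes would update `feature_weights` repeatedly and rebuild the centroids each round; that loop is left out here because its details are not stated.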
Similar resources
Weight Adjustment Schemes for a Centroid Based Classifier
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intra-nets. Automatic text categorization, which is the task of assigning text documents to pre-specified classes (topics or themes) of documents, is an important task that can help both in organizing as well as in finding information on t...
Weight adjustment schemes for a centroid based classifier
In recent years we have seen a tremendous growth in the volume of text documents available on the Internet, digital libraries, news sources, and company-wide intra-nets. Automatic text categorization, which is the task of assigning text documents to pre-specified classes (topics or themes) of documents, is an important task that can help both in organizing as well as in finding information on t...
Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification
Text categorization is the task of deciding whether a document belongs to a set of prespecified classes of documents. Automatic classification schemes can greatly facilitate the process of categorization. Categorization of documents is challenging, as the number of discriminating words can be very large. Many existing algorithms simply would not work with these many number of features. k-neares...
Cluster Based Hybrid Niche Mimetic and Genetic Algorithm for Text Document Categorization
An efficient cluster based hybrid niche mimetic and genetic algorithm for text document categorization to improve the retrieval rate of relevant document fetching is addressed. The proposal minimizes the processing of structuring the document with better feature selection using hybrid algorithm. In addition restructuring of feature words to associated documents gets reduced, in turn increases d...
Class-Based Weighted NB for Text Categorization
Naïve Bayes classifier is a supervised and probabilistic learning method (Manning, Raghavan, & Schuetze, 2008) which greatly simplifies learning by making the assumption that provided features are conditionally independent. Although this assumption usually does not hold, this classifier proves to compete well with other more sophisticated techniques (Rish, 2001). Moreover, being fast and easy t...